In this analysis, there are 150 subjects and 1205 observations. Mean (range) of follow-up time per patient was 1 (0, 2) years. Mean (range) number of observations was 8 (1, 29). The same number of patients and observations were utilized in each model below. We assume for each model that the data are missing at random, except for the joint longitudinal-survival model. The dependent variable is GLI FEV1 (% predicted) named “y” in the output below. The time scale is patients’ age at visit. Age is expressed in years. Age at CF diagnosis (in years) was included as a covariate in each model. Below are summary statistics for each variable used in the model. Once a patient was recorded as having a lung transplant, his or her data were censored from the model.
## y Age Age_Diag_per_patient
## Min. : 10.60 Min. : 6.01 Min. : 0.000
## 1st Qu.: 43.57 1st Qu.:10.94 1st Qu.: 0.100
## Median : 73.99 Median :18.67 Median : 0.300
## Mean : 70.57 Mean :19.47 Mean : 2.004
## 3rd Qu.: 94.66 3rd Qu.:24.54 3rd Qu.: 1.700
## Max. :131.21 Max. :52.12 Max. :17.500
The Kaplan-Meier (KM) survival curve for those who died or received a lung transplant is plotted below showing the survival probability over age at visit (top graphic) and the number still alive/without lung transplant (bottom graphic).
For this scenario, we have 18 subjects (out of total 150) who subjects who died or received lung transplant over ages. The event rate is around 0.12 or 12%. The minimum time is 12.5 years old and maximum is 51 years old. All 18 subjects have different event time.
In the following report, we investigate three commonly used structures in CF FEV1 analysis. These include the linear mixed-effects model (LME), Generalized Estimating Equations (GEE) and joint modelling (JM). For the LME models, we include different combinations of random effects, splines and correlation structures as used in published CF studies (Section 2). We examine published GEE models combined with additional features for linearity/non-linearity (Section 3). We fit a joint model with the linear/non-linear structure from the LME models in Section 2.
Average evolution: A linear evolution is assumed over time.
Patient-specific evolution: A linear evolution is assumed over time. We assume that each patient may start from a different time point but all patients have the same evolution over time.
## [1] "AIC = 9292"
## [1] "BIC = 9318"
| StdDev | |
|---|---|
| (Intercept) | 22.589837 |
| Residual | 9.014941 |
| Value | Std.Error | DF | t-value | p-value | |
|---|---|---|---|---|---|
| (Intercept) | 104.044 | 3.939 | 1054 | 26.410 | 0.000 |
| Age | -1.589 | 0.184 | 1054 | -8.620 | 0.000 |
| Age_Diag | 0.789 | 0.525 | 148 | 1.504 | 0.135 |
As the coefficient estimate for ‘Age’ is around -1.589 % predicted/year, this means for one year increase in age, FEV1 is decreasing by 1.589 (unit % predicted).
We can calculate the population-level evolution (left) and rate of change (right) over time. As we have assumed a linear evolution, the left figure shows a straight line, corresponding to a linear estimated fit. The rate of change is calculated by taking first derivative of the model equation over time. This model estimates the rate of change in FEV1 as constant over different ages as -1.589.
We examined standardized residuals in terms of randomness, spread and nature of distribution.
The residuals vs fitted plot (upper left): residuals are randomly distributed over the fitted values and are clustered around 0. But we notice there are several points are apart from 0, (i.e. one point at residual round 10 and fitted value around 100). This could be due to the unbalance distribution of FEV1 or the extreme value in data.
The residuals vs age (upper right): These are randomly scattered over time (age) and the spread of the points is slightly decreasing over time. The assumption of homogeneity for the residuals of variance is roughly hold. However, we also notice few points are apart from the 0 horizontal line.
The quantile-quantile (QQ) plot (lower left): We used the QQ plot to check the normality assumption, which compares quantiles of the standardized residuals to those from a standard normal distribution. Having all points along the diagonal line (red line in plot) would suggest that the residuals are approximately normally distributed. However, in this QQ plot, there is evidence that the distribution is heavier tailed where this feature is sometimes evident when modeling longitudinal data.
The density of residuals (lower right): indicates the distribution of the residuals is slightly left skewed. These diagnostics show the model is mainly valid, but further investigation may be needed to account for the fact that the variance decreases over time and to accommodate the heavier tails.
Average evolution: A linear evolution is assumed over time.
Patient-specific evolution: A linear evolution is assumed over time. We assume that each patient may start from a different time point but all patients have different evolution over time.
## [1] "AIC = 9287"
## [1] "BIC = 9323"
| StdDev | |
|---|---|
| (Intercept) | 14.199824 |
| Age | 0.589098 |
| Residual | 8.990464 |
| Value | Std.Error | DF | t-value | p-value | |
|---|---|---|---|---|---|
| (Intercept) | 105.611 | 3.578 | 1054 | 29.516 | 0.000 |
| Age | -1.677 | 0.200 | 1054 | -8.386 | 0.000 |
| Age_Diag | 0.770 | 0.570 | 148 | 1.351 | 0.179 |
As the estimation of coefficient for ‘Age’ is around -1.677 % predicted/year, this means for one year increase in age, FEV1 is decreasing by 1.677 (unit % predicted).
The model diagnostic plots are below and they are similar to the previous pattern.
Average evolution: A non-linear evolution (assuming cubic natural splines) is assumed over time.
Patient-specific evolution: A non-linear evolution (assuming cubic natural splines) is assumed over time. A diagonal variance-covariance matrix for the random effects are assumed. This means that the random effects are not correlated.
The knots, listed below, were chosen according to a recursive algorithm (reference: Spline Regression with Automatic Knot Selection, Vivien Goepp (2018)). There are 6 knots in total selected for the entire dataset (age: 6 years and older).
## [1] "Knots location (at age): 9.8, 13.5, 17.3, 21.4, 31.3"
## [1] "AIC = 9237"
## [1] "BIC = 9319"
| StdDev | |
|---|---|
| (Intercept) | 16.218884 |
| ns1 | 20.607789 |
| ns2 | 17.685630 |
| ns3 | 29.868936 |
| ns4 | 31.471322 |
| ns5 | 13.378872 |
| ns6 | 66.527389 |
| Residual | 8.436234 |
| Value | Std.Error | DF | t-value | p-value | |
|---|---|---|---|---|---|
| (Intercept) | 99.142 | 3.815 | 1049 | 25.985 | 0.000 |
| ns1 | -13.274 | 6.205 | 1049 | -2.139 | 0.033 |
| ns2 | -19.625 | 6.127 | 1049 | -3.203 | 0.001 |
| ns3 | -41.742 | 6.972 | 1049 | -5.987 | 0.000 |
| ns4 | -27.559 | 18.199 | 1049 | -1.514 | 0.130 |
| ns5 | -127.423 | 30.227 | 1049 | -4.215 | 0.000 |
| ns6 | -171.810 | 63.189 | 1049 | -2.719 | 0.007 |
| Age_Diag | 0.885 | 0.571 | 148 | 1.551 | 0.123 |
As non-linear evolution over time is assumed, the population-level evolution (left) and rate of change (right) are varying over time. The population-level trajectory is slightly curved during the age range 6 to 30 years old and is linearly decreasing afterward. The rate of change plot shows a fast change at earlier age (around 30 years old) with a lowest decline occurred around age 20 years old.
Average evolution: A linear mean evolution is assumed over time.
Patient-specific evolution: A linear evolution is assumed over time. We assume that each patient may start from a different time point but all patients have the same evolution over time. Correlation structure: exponential correlation function. This means that, for each subject, the correlation of FEV1 exponentially decays to zero with the distance between times (ages).
## [1] "AIC = 9216"
## [1] "BIC = 9252"
| StdDev | |
|---|---|
| (Intercept) | 0.04146124 |
| Residual | 24.35877545 |
## Correlation Structure: Exponential spatial correlation
## Formula: ~Age | eDWID
## Parameter estimate(s):
## range nugget
## 19.6114439 0.1043369
| Value | Std.Error | DF | t-value | p-value | |
|---|---|---|---|---|---|
| (Intercept) | 103.294 | 4.047 | 1054 | 25.525 | 0.000 |
| Age | -1.539 | 0.191 | 1054 | -8.041 | 0.000 |
| Age_Diag | 0.736 | 0.528 | 148 | 1.395 | 0.165 |
As the estimation of the coefficient for ‘Age’ is around -1.539 % predicted/year, this means for one year increase in age, FEV1 is decreasing by 1.539 (unit % predicted).
The residuals vs fitted plot (upper left): there is a decreasing trend in variability of residuals over the fitted value which might due to the unbalance of the response variable FEV1.
The residuals vs age (upper right): The spread of the scatter is decreasing over time and all points are roughly clustered around 0 and roughly symmetric.
The quantile-quantile (QQ) plot (lower left): is slightly heavier tailed, but is better than previous models.
The density of residuals (lower right): has a heavier right tail and not symmetric. Further investigation may be needed.
Average evolution: A non-linear evolution (assuming cubic natural splines) is assumed over time.
Patient-specific evolution: A linear evolution is assumed over time and each patient may start at different time point
Correlation structure: exponential correlation function. This means that, for each subject, the correlation of FEV1 exponentially decays to zero with the distance between times (age).
## [1] "Knots location (at age): 9.8, 13.5, 17.3, 21.4, 31.3"
## [1] "AIC = 9219"
## [1] "BIC = 9280"
| StdDev | |
|---|---|
| (Intercept) | 0.05198012 |
| Residual | 24.10691172 |
## Correlation Structure: Exponential spatial correlation
## Formula: ~Age | eDWID
## Parameter estimate(s):
## range nugget
## 19.8375331 0.1067286
| Value | Std.Error | DF | t-value | p-value | |
|---|---|---|---|---|---|
| (Intercept) | 99.410 | 4.917 | 1049 | 20.217 | 0.000 |
| ns1 | -12.826 | 7.291 | 1049 | -1.759 | 0.079 |
| ns2 | -18.860 | 7.274 | 1049 | -2.593 | 0.010 |
| ns3 | -42.782 | 7.441 | 1049 | -5.749 | 0.000 |
| ns4 | -26.507 | 18.945 | 1049 | -1.399 | 0.162 |
| ns5 | -119.501 | 30.896 | 1049 | -3.868 | 0.000 |
| ns6 | -157.231 | 64.642 | 1049 | -2.432 | 0.015 |
| Age_Diag | 0.672 | 0.531 | 148 | 1.267 | 0.207 |
Compared with LME models, GEE models allow for weaker distributional assumptions and are more robust to covariance misspecification. In this analysis, the average evolution (fixed-effects) structure in the GEE model was selected based on the performance of all LME models. This GEE model is fitted with a linear structure to model the mean evolution and accounts for longitudinal correlation using the sandwich estimator.
Average evolution: A linear evolution (assuming natural cubic splines) is used over time.
Patient-specific evolution: A linear evolution is assumed over time and each patient may start at different time point.
Correlation structure: independence (working) correlation function.
| Estimate | Std.err | Wald | Pr(>|W|) | |
|---|---|---|---|---|
| (Intercept) | 100.250 | 4.208 | 567.622 | 0.000 |
| Age | -1.596 | 0.176 | 82.669 | 0.000 |
| Age_Diag | 0.810 | 0.599 | 1.830 | 0.176 |
| Estimate | Std.err | |
|---|---|---|
| (Intercept) | 599.319 | 55.692 |
In this analysis, the GEE mode is fitted with a spline structure to model the mean evolution and includes an independence correlation.
Average evolution: A non-linear evolution (assuming cubic natural splines) is used over time.
Patient-specific evolution: A linear evolution is assumed over time and each patient may start at different time point.
Correlation structure: independence correlation function (sandwich estimator used to account for longitudinal correlation).
| Estimate | Std.err | Wald | Pr(>|W|) | |
|---|---|---|---|---|
| (Intercept) | 91.492 | 8.997 | 103.401 | 0.000 |
| ns1 | -3.853 | 13.751 | 0.078 | 0.779 |
| ns2 | -15.036 | 13.746 | 1.196 | 0.274 |
| ns3 | -45.308 | 11.940 | 14.399 | 0.000 |
| ns4 | -17.317 | 23.482 | 0.544 | 0.461 |
| ns5 | -108.001 | 33.368 | 10.476 | 0.001 |
| ns6 | -168.365 | 64.493 | 6.815 | 0.009 |
| Age_Diag | 0.660 | 0.530 | 1.549 | 0.213 |
| Estimate | Std.err | |
|---|---|---|
| (Intercept) | 569.361 | 56.561 |
The quantile-quantile (QQ) plot (lower left): The Q-Q plot shows that residuals approximate the unit normal distribution, which suggests that the relaxed assumptions in GEE model alleviate issues with heavy tails. The density of residuals (lower right): has a bit heavier left tail, but roughly symmetric.
Missing data due to death or lung transplant is accounted for by jointly modeling the FEV1 process with the survival process using a Weibull model for survival. We use a Weibull baseline risk function to fit the baseline hazard in the survival sub-model. This joint model was implemented using the shared parameter framework. The shared parameters here corresponded to connecting the survival outcome with the underlying value of FEV1 and rate of change in FEV1 over time.
We assume that FEV1 and survival are associated over time (age). To fit the longitudinal sub-model for FEV1, we apply the LME model with linear assumption; for survival, we use the Weibull survival model.
Average evolution of FEV1: A linear evolution is assumed over time.
Patient-specific evolution of FEV1: A linear evolution is assumed over time and each patient may start at different time point (and have the same evolution).
Survival Baseline hazard: A Weibull baseline risk hazard is assumed over time at baseline.
Association: We assume the underlying value of FEV1 and rate of change over time to be associated with survival.
## [1] "AIC = 9231"
## [1] "BIC = 9267"
The variance components for FEV1 include the standard deviations of the random intercept as “(Intercept)”, slope as “Age” and residual as “Residual” reported below.
| StdDev | |
|---|---|
| (Intercept) | 12.429 |
| Age | 0.496 |
| Residual | 8.696 |
| Value | Std.Err | z-value | p-value | |
|---|---|---|---|---|
| (Intercept) | 106.202 | 0.605 | 175.524 | 0 |
| Age | -1.688 | 0.028 | -59.618 | 0 |
| Age_Diag | 0.697 | 0.081 | 8.635 | 0 |
| Value | p-value | Hazard Ratio | Lower | Upper | |
|---|---|---|---|---|---|
| (Intercept) | -81.815 | 0.017 | 0.000 | 0.000 | 0.000 |
| Age_Diag | -0.273 | 0.012 | 0.761 | 0.615 | 0.943 |
| Assoct | 0.153 | 0.142 | 1.165 | 0.950 | 1.428 |
| Assoct.s | -11.658 | 0.023 | 0.000 | 0.000 | 0.198 |
| log(shape) | 2.698 | 0.000 | 14.850 | 6.883 | 32.038 |
Average evolution of FEV1: A non-linear evolution by cubic natural spline is assumed over time.
Patient-specific evolution of FEV1: A linear evolution is assumed over time and each patient may start at different time point (and have the same evolution).
Survival Baseline hazard: A Weibull baseline risk hazard is assumed over time at baseline.
Association: We assume the underlying value of FEV1 and rate of change over time to be associated with survival.
## [1] "AIC = 9435"
## [1] "BIC = 9486"
| StdDev | |
|---|---|
| (Intercept) | 14.060 |
| Age | 0.652 |
| Residual | 8.953 |
| Value | Std.Err | z-value | p-value | |
|---|---|---|---|---|
| (Intercept) | 99.181 | 3.616 | 27.431 | 0.000 |
| ns(Age, knots = c(9.8, 13.5, 17.3, 21.4, 31.3))1 | -15.190 | 5.064 | -3.000 | 0.003 |
| ns(Age, knots = c(9.8, 13.5, 17.3, 21.4, 31.3))2 | -12.979 | 5.194 | -2.499 | 0.012 |
| ns(Age, knots = c(9.8, 13.5, 17.3, 21.4, 31.3))3 | -43.076 | 5.898 | -7.304 | 0.000 |
| ns(Age, knots = c(9.8, 13.5, 17.3, 21.4, 31.3))4 | -28.549 | 8.422 | -3.390 | 0.001 |
| ns(Age, knots = c(9.8, 13.5, 17.3, 21.4, 31.3))5 | -79.941 | 7.905 | -10.112 | 0.000 |
| ns(Age, knots = c(9.8, 13.5, 17.3, 21.4, 31.3))6 | -81.521 | 4.657 | -17.504 | 0.000 |
| Age_Diag | 0.749 | 0.538 | 1.394 | 0.163 |
| Value | p-value | Hazard Ratio | Lower | Upper | |
|---|---|---|---|---|---|
| (Intercept) | -5.094 | 0.264 | 0.006 | 0.000 | 47.029 |
| Age_Diag | -0.132 | 0.132 | 0.876 | 0.738 | 1.041 |
| Assoct | -0.096 | 0.000 | 0.909 | 0.871 | 0.947 |
| Assoct.s | 0.626 | 0.065 | 1.870 | 0.961 | 3.637 |
| log(shape) | 1.039 | 0.011 | 2.826 | 1.270 | 6.289 |
| Name | Specification |
|---|---|
| Int | Linear evolution with random intercept |
| Int + Slope | Linear evolution with random intercept and slope |
| Spline | Non-linear evolution by using spline as both average evolution and random effect |
| Int + Corr | Linear evolution with random intercept and exponential correlation structure |
| Spline + Corr | Non-linear evolution with spline as average evolution, random intercept and exponential correlation structure |
| Lin_GEE | Linear evolution with independence correlation structure |
| Nonlin_GEE | Non-linear evolution by using spline with independence correlation structure |
| Lin_JM | Linear evolution with jointly model FEV and survival |
| Nonlin_JM | Non-linear evolution with jointly model FEV and survival |
To check more closely, we zoom in the result for age 6 to 30.